CSE 255 Assignment 8
Abstract
In this paper we train an L1-regularized linear support vector machine (SVM) to determine whether a movie review expresses positive or negative sentiment. We train and test on the movie review polarity dataset introduced by Pang and Lee (2004) [2]. We improve the classification accuracy of the linear SVM through a series of experiments with various data preprocessing techniques and data transformations. Accuracy over the 10 cross-validation folds is highest after removing numerical entries and applying log-odds weighting to terms. Our final linear SVM, with per-example regularization cost c = 1.00, achieves a classification accuracy of 0.877. This exceeds the 0.864 accuracy obtained using subjectivity extracts (Pang and Lee, 2004) and approaches the 0.905 accuracy obtained using linguistic knowledge sources (Ng et al., 2006 [4]).
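The abstract names log-odds weighting without giving a formula. The sketch below therefore assumes one common variant. For each term t, let p and q be the add-alpha smoothed fractions of positive and negative training reviews that contain t. The weight of t is then log(p / (1 - p)) - log(q / (1 - q)), so terms that lean positive get positive weights and terms that lean negative get negative ones.

This is a minimal, standard-library sketch of the preprocessing and weighting steps only, not the authors' code. The following are all hypothetical:

- the tokenizer regex;
- the smoothing constant `alpha`;
- the helper names `tokenize`, `log_odds_weights`, and `featurize`;
- the toy reviews.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: lowercase alphanumeric tokens, optionally with an
# apostrophe suffix such as "don't". Not taken from the paper.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase and tokenize a review, dropping purely numerical tokens."""
    return [t for t in TOKEN_RE.findall(text.lower()) if not t.isdigit()]


def log_odds_weights(docs, labels, alpha=1.0):
    """Return {term: log-odds weight} computed from tokenized training docs.

    Assumed variant: p and q are add-alpha smoothed fractions of positive
    and negative documents containing the term, and the weight is
    log(p / (1 - p)) - log(q / (1 - q)). Labels are +1 or -1.
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, y in zip(docs, labels):
        if y > 0:
            pos_df.update(set(tokens))  # document frequency, not raw term count
            n_pos += 1
        else:
            neg_df.update(set(tokens))
            n_neg += 1
    weights = {}
    for term in pos_df.keys() | neg_df.keys():
        # Smoothing keeps p and q strictly between 0 and 1, so the logs are finite.
        p = (pos_df[term] + alpha) / (n_pos + 2 * alpha)
        q = (neg_df[term] + alpha) / (n_neg + 2 * alpha)
        weights[term] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    return weights


def featurize(tokens, weights):
    """Sparse vector: term presence scaled by its weight; unseen terms are dropped."""
    return {t: weights[t] for t in set(tokens) if t in weights}


# Tiny hypothetical corpus (made up, not from the polarity dataset).
reviews = ["A great film, 10 out of 10", "Great acting",
           "A dull film", "Dull and slow, 2 stars"]
labels = [1, 1, -1, -1]
docs = [tokenize(r) for r in reviews]
w = log_odds_weights(docs, labels)
print(docs[0])               # ['a', 'great', 'film', 'out', 'of']
print(round(w["great"], 3))  # 2.197  (= 2 * ln 3)
print(featurize(["great", "film", "plot"], w))
```

In a 10-fold evaluation, the weights would be recomputed from the nine training folds each time so that the held-out fold does not leak into them. The resulting dicts could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to an L1-penalized linear SVM such as `LinearSVC(penalty="l1", dual=False, C=1.0)`. That estimator uses the squared hinge loss, which may differ from the loss used in the paper.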
Related resources
CSE 255: Assignment 1 - Exploring Musical Tagging
We explore two predictive tasks: (i) a measure of tag probability, and (ii) identifying a minimum tag set for more meaningful music classification on a 100,000-song dataset joined across complementary databases from the 1 Million Song Dataset (“MSD”). We conclude that a tag set size of around 50 tags is most meaningful and report many of our findings and analyses based on the top 50 tags. Using lin...
CSE 255 Assignment 2 Cuisine Prediction/Classification based on ingredients
In this paper, we consider different strategies for identifying the cuisine of a dish given its ingredients. This project aims to explore which combinations of ingredients help identify a cuisine when the recipe is not given. We treat this as a cuisine classification problem. We also explore different classification algorithms in tandem with approaches like taking combination of multi...
CSE 255 Assignment 1: Helpfulness in Amazon Reviews
In this paper we consider models for predicting the helpfulness rating of Amazon book reviews. We examine features such as the review’s star rating, the length of the review text, the readability of the review text, and the number of comparisons made in the review. We compare Support Vector Machine and Random Forests models for both regression and classification.
CSE 255 Assignment 2: Upvotes Prediction for Reddit Submissions
In this paper we consider models for predicting the number of upvotes on a Reddit submission. We examine features such as the number of votes, the number of comments, the time of submission, the upvote history of users, images, and the subreddit of the submission. We compare Support Vector Regression, Linear Regression, and Gradient Boosting Regression models for predicting the number of upvotes.
Publication date: 2015